Week 4: Unary and Binary Operations

PPOL 6805 / DSAN 6750: GIS for Spatial Data Science
Fall 2025

Jeff Jacobs

jj1088@georgetown.edu

Tuesday, September 17, 2024

Quick Logistical Things

  • HW1 Question 1 Clarifications
  • “Public Tests” vs. Hidden Tests

HW1 Question 1 Generally

  • “Simplest” / Most Efficient Representation

    “Indicate which of the seven geometries above would provide the simplest representation of the entity”

  • GEOMETRYCOLLECTION could represent any of the geographic entities in Q1, but would be overkill for representing e.g. a single point or line

HW1 Question 1.8 Specifically

For the United Arab Emirates (UAE) data… we have what we call the Not-Paraguay-Problem: Most countries, including the UAE, have a bunch of lil “pieces”:

Code
ru_national_map <- ne_countries(type = "countries", country = "Russia", scale = "medium", returnclass = "sf")
mapview(ru_national_map)
Code
uae_national_map <- ne_countries(type = "countries", country = "United Arab Emirates", scale = "large", returnclass = "sf")
mapview(uae_national_map)

So, for Q1.8: Assume we’re just trying to have the computer represent the “mainland” (the biggest contiguous landmass, with Dubai on it!)

Public vs. Hidden Tests

  • Public Tests are basically “Quality Assurance Test”
  • Hidden Tests check for correctness

Coordinate Reference Systems

  • First things first… what are coordinates?
  • We need a way to unambiguously specify where things are on the earth

Latitude and Longitude are Angles!

From Krygier and Wood (2016)

Angular Distance vs. Travel Distance

The Earth’s “width” is slightly greater than its “length” 😰

From Wikimedia Commons

Projections

Smooshing 3D into 2D

From Monmonier (2018)

Avoid Getting Lost in the Sauce

How To Avoid Getting Lost in the Sauce

Tissot Circles: Imagine infinitely small ellipses placed at regular intervals on the curved surface of the earth. Imagine these ellipses being projected along with the earth’s surface. When scaled up, changes in the ellipses show the location and quality of distortions on the projected map.

Choropleths for Good and Evil

  • Crucial crucial aspect of GIS: Having Domain Knowledge of a Region

The US’s Ur-Choropleth #1: Population

Kieran Healy, “America’s Ur-Choropleths”

The US’s Ur-Choropleth #2: Race

Kieran Healy, “America’s Ur-Choropleths”

Crime in Mongolia

From Reddit

Population of Mongolia

From Wikimedia Commons

Exhibit A

From Monmonier (2018)

Exhibit B

From Monmonier (2018)

Continuous Choropleth

Is poverty a “significant issue” in the US?

From Krygier and Wood (2016)

Quantile Colormap

Is poverty a “significant issue” in the US?

Assigns the same number of observations to each color

From Krygier and Wood (2016)

Equal-Area Colormap

Is poverty a “significant issue” in the US?

Boundaries between colors come at regular (equal) intervals

From Krygier and Wood (2016)

Natural-Break Colormap

Is poverty a “significant issue” in the US?

Clustering algorithm chooses classes to (a) minimize differences within classes, (b) maximize differences between classes

From Krygier and Wood (2016)

Context-Sensitive Colormap

Is poverty a “significant issue” in the US?

A government program offers special funding for counties with above 25% poverty

From Krygier and Wood (2016)

The Importance of History

Is poverty a “significant issue” in the US?

From Krygier and Wood (2016)

Geospatial Operations 1: Unary Operations

Getting the Geometries

Using rnaturalearth with mapview

Code
set.seed(6805)
library(tidyverse)
library(sf)
library(rnaturalearth)
library(mapview)
france_sf <- ne_countries(country = "France", scale = 50)
(france_map <- mapview(france_sf, label = "geounit", legend = FALSE))

Centroid of France

Code
france_cent_sf <- sf::st_centroid(france_sf)
france_map + mapview(france_cent_sf, label = "Centroid", legend = FALSE)

One We Already Saw: Union

Computing the union of all geometries in the sf via sf::st_union()

Code
library(leaflet.extras2)
africa_sf <- ne_countries(continent = "Africa", scale = 50)
africa_union_sf <- sf::st_union(africa_sf)
africa_map <- mapview(africa_sf, label="geounit", legend=FALSE)
africa_union_map <- mapview(africa_union_sf, label="st_union(africa)", legend=FALSE)
africa_map | africa_union_map

Helpful for Rasterizing: BBox

Code
africa_bbox_sf <- sf::st_bbox(africa_sf)
africa_bbox_map <- mapview(africa_bbox_sf, label="st_bbox(africa)", legend=FALSE)
africa_map | africa_bbox_map

Convex Hulls by Country

Code
africa_countries_cvx <- sf::st_convex_hull(africa_sf)
africa_countries_cvx_map <- mapview(africa_countries_cvx, label="geounit", legend=FALSE)
africa_map | africa_countries_cvx_map

Convex Hull of Continent

Use st_union() first:

Code
africa_cvx <- africa_sf |> st_union() |> st_convex_hull()
africa_cvx_map <- mapview(africa_cvx, label="geounit", legend=FALSE)
africa_map | africa_cvx_map

One We Already Saw: Centroids

Computing the centroid of all geometries in the sf via sf::st_centroid()

Code
africa_cents_sf <- sf::st_centroid(africa_sf)
africa_cents_map <- mapview(africa_cents_sf, label="geounit", legend=FALSE)
africa_map | africa_cents_map

Geospatial Operations 2: Binary Operations

Spatial Joins

Code
nc <- system.file("shape/nc.shp", package="sf") |>
  read_sf() |>
  st_transform('EPSG:2264')
gr <- st_sf(
         label = apply(expand.grid(1:10, LETTERS[10:1])[,2:1], 1, paste0, collapse = ""),
         geom = st_make_grid(nc))
gr$col <- sf.colors(10, categorical = TRUE, alpha = .3)
# cut, to verify that NA's work out:
gr <- gr[-(1:30),]
suppressWarnings(nc_j <- st_join(nc, gr, largest = TRUE))
par(mfrow = c(2,1), mar = rep(0,4))
plot(st_geometry(nc_j), border = 'grey')
plot(st_geometry(gr), add = TRUE, col = gr$col)
text(st_coordinates(st_centroid(st_geometry(gr))), labels = gr$label, cex = .85)
# the joined dataset:
plot(st_geometry(nc_j), border = 'grey', col = nc_j$col)
text(st_coordinates(st_centroid(st_geometry(nc_j))), labels = nc_j$label, cex = .7)
plot(st_geometry(gr), border = '#88ff88aa', add = TRUE)

Spatial Sampling

Code
# Sample random points
africa_points_list <- sf::st_sample(africa_union_sf, 10)
africa_points_sf <- sf::st_sf(africa_points_list)
africa_points_map <- mapview(africa_points_sf, label="Random Point", col.regions=cb_palette[1], legend=FALSE)
africa_map + africa_points_map

The “Default” Predicate: st_intersects

Code
countries_w_points <- africa_sf[africa_points_sf,]
mapview(countries_w_points, label="geounit", legend=FALSE) + africa_points_map

Counting with lengths()

Code
country_inter <- sf::st_intersects(africa_sf, africa_points_sf)
# Computes point counts for each polygon
(num_intersections <- lengths(country_inter))
 [1] 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0
[39] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2
Code
africa_sf <- africa_sf |> mutate(
  num_points = num_intersections
) |> arrange(geounit)
africa_sf |> select(geounit, num_points) |> head()
geounit num_points geometry
Algeria 2 MULTIPOLYGON (((8.576563 36…
Angola 2 MULTIPOLYGON (((13.07275 -4…
Benin 0 MULTIPOLYGON (((1.622656 6….
Botswana 0 MULTIPOLYGON (((25.25879 -1…
Burkina Faso 0 MULTIPOLYGON (((0.9004883 1…
Burundi 0 MULTIPOLYGON (((30.55361 -2…

Plotting with mapview

Code
mapview(africa_sf, zcol="num_points")

Plotting with ggplot2

Since we’re starting to get into data attributes rather than geometric features, switching to ggplot2 is recommended!

Code
africa_sf |> ggplot(aes(fill=num_points)) +
  geom_sf() +
  theme_classic()

One More Unary Operation: st_buffer()

  • Think about how you might answer questions like:
    • “How far is your house (POINT) from Manhattan (POLYGON)?”
    • “Are there any chemical plants within a mile of this building (POLYGON) / stretch of road (LINESTRING)?”
  • Lazy mode (my favorite mode): Compute distances from the centroid
  • GIS master mode: Construct the buffer!

On POLYGONs

Key line: manhattan_buffer_sf <- manhattan_union_sf |> st_buffer(dist = 1609.34) (1 mile \(\approx\) 1609.34 meters)

Code
library(tidyverse)
library(sf)
library(tidycensus)
library(tigris)
library(mapview)
options(tigris_use_cache = TRUE)
manhattan_sf <- get_acs(
  geography = "tract",
  variables = "B19013_001",
  state = "NY",
  county = "New York",
  year = 2020,
  geometry = TRUE,
  cb = FALSE
)
# Erase the island tracts real quick
island_tracts <- c(
  "Census Tract 1, New York County, New York",
  "Census Tract 2.02, New York County, New York"
)
manhattan_sf <- manhattan_sf |> filter(
  !(NAME %in% island_tracts)
)
# Union of all census tracts within the county
manhattan_union_sf <- st_union(manhattan_sf)
manhattan_union_map <- mapview(manhattan_union_sf, label="New York County")
# Construct buffer (1 mile ~= 1609.34 meters)
manhattan_buffer_sf <- manhattan_union_sf |> st_buffer(dist = 1609.34)
manhattan_buffer_map <- mapview(manhattan_buffer_sf, label="Buffer (1 Mile)", col.regions = cbPalette[1], legend = TRUE)
manhattan_buffer_map + manhattan_union_map

On LINESTRINGs

Code
library(tidyverse)
library(sf)
## st_buffer, style options (taken from rgeos gBuffer)
l1 = st_as_sfc("LINESTRING(0 0,1 5,4 5,5 2,8 2,9 4,4 6.5)")
op = par(mfrow=c(2,3))
plot(st_buffer(l1, dist = 1, endCapStyle="ROUND"), reset = FALSE, main = "endCapStyle: ROUND")
plot(l1,col='blue',add=TRUE)
plot(st_buffer(l1, dist = 1, endCapStyle="FLAT"), reset = FALSE, main = "endCapStyle: FLAT")
plot(l1,col='blue',add=TRUE)
plot(st_buffer(l1, dist = 1, endCapStyle="SQUARE"), reset = FALSE, main = "endCapStyle: SQUARE")
plot(l1,col='blue',add=TRUE)
plot(st_buffer(l1, dist = 1, nQuadSegs=1), reset = FALSE, main = "nQuadSegs: 1")
plot(l1,col='blue',add=TRUE)
plot(st_buffer(l1, dist = 1, nQuadSegs=2), reset = FALSE, main = "nQuadSegs: 2")
plot(l1,col='blue',add=TRUE)
plot(st_buffer(l1, dist = 1, nQuadSegs= 5), reset = FALSE, main = "nQuadSegs: 5")
plot(l1,col='blue',add=TRUE)

From the sf Documentation

What Makes Binary Operations “Fancier”?

Unary
  • st_centroid()
    • POLYGON \(\mapsto\) POINT
    • MULTIPOLYGON \(\mapsto\) POINT
  • st_convex_hull()
    • POLYGON \(\mapsto\) POLYGON
    • MULTIPOINT \(\mapsto\) POLYGON
Binary
  • st_intersection()
    • (POINT, POINT) \(\mapsto\) POINT | POINT EMPTY
    • (POLYGON, POLYGON) \(\mapsto\) POLYGON | LINESTRING | POINT | POLYGON EMPTY
  • st_is_empty() and st_dimension() become your new best friends 😉
  • st_is_empty(): Distinguishes between, e.g., POINT EMPTY and POINT(0 0)
  • st_dimension(): NA for empty versions, otherwise
    • 2 for surfaces (POLYGON, MULTIPOLYGON)
    • 1 for lines (LINESTRING, MULTILINESTRING)
    • 0 for points (POINT, MULTIPOINT)

The Bad Kind of Overthinking: Will My Life Just Get Harder and Harder?

Unary Operations

Binary Operations

Ternary Operations

Quaternary Operations

Good News and Bad News

  • The good news: No!
  • The bad news: You’ll have to read the 465-page Volume I and then the 451-page Volume II and then to page 15 of Volume III of Cohn (1965) to know why:

(i spent 4 yrs of undergrad studying abstract algebra and now it all sits gathering dust somewhere deep within my brain plz just let me have this moment i’ll never mention it again i promise)

The Good Kind of Overthinking…

  • For fancier geospatial operations, we’ll need to start overthinking, about the possible relationships between two (or more) geometries! \(\leadsto\) Relational Predicates:

Figure 4.2 in Lovelace, Nowosad, and Muenchow (2024)

Ad Hoc / Ambiguous \(\rightarrow\) Precise: Enter DE-9IM!

DE-9IM Strings

Each cell here visualizes one component of the DE-9IM string 1020F1102, which describes the relationship between the following two geometries:

  • Boxey McBoxface: POLYGON(0 0, 1 0, 1 1, 0 1, 0 0)
  • Liney McLineface: LINESTRING(0.5 -0.5, 0.5 0.5)
Code
library(sf)
polygon <- po <- st_polygon(list(rbind(c(0,0), c(1,0), c(1,1), c(0,1), c(0,0))))
p0 <- st_polygon(list(rbind(c(-1,-1), c(2,-1), c(2,2), c(-1,2), c(-1,-1))))
line <- li <- st_linestring(rbind(c(.5, -.5), c(.5, 0.5)))
s <- st_sfc(po, li)

par(mfrow = c(3,3))
par(mar = c(1,1,1,1))

# "1020F1102"
# 1: 1
plot(s, col = c(NA, 'darkgreen'), border = 'blue', main = expression(paste("I(pol)",intersect(),"I(line) = 1")))
lines(rbind(c(.5,0), c(.5,.495)), col = 'red', lwd = 2)
points(0.5, 0.5, pch = 1)

# 2: 0
plot(s, col = c(NA, 'darkgreen'), border = 'blue', main = expression(paste("I(pol)",intersect(),"B(line) = 0")))
points(0.5, 0.5, col = 'red', pch = 16)

# 3: 2
plot(s, col = c(NA, 'darkgreen'), border = 'blue', main = expression(paste("I(pol)",intersect(),"E(line) = 2")))
plot(po, col = '#ff8888', add = TRUE)
plot(s, col = c(NA, 'darkgreen'), border = 'blue', add = TRUE)

# 4: 0
plot(s, col = c(NA, 'darkgreen'), border = 'blue', main = expression(paste("B(pol)",intersect(),"I(line) = 0")))
points(.5, 0, col = 'red', pch = 16)

# 5: F
plot(s, col = c(NA, 'darkgreen'), border = 'blue', main = expression(paste("B(pol)",intersect(),"B(line) = F")))

# 6: 1
plot(s, col = c(NA, 'darkgreen'), border = 'blue', main = expression(paste("B(pol)",intersect(),"E(line) = 1")))
plot(po, border = 'red', col = NA, add = TRUE, lwd = 2)

# 7: 1
plot(s, col = c(NA, 'darkgreen'), border = 'blue', main = expression(paste("E(pol)",intersect(),"I(line) = 1")))
lines(rbind(c(.5, -.5), c(.5, 0)), col = 'red', lwd = 2)

# 8: 0
plot(s, col = c(NA, 'darkgreen'), border = 'blue', main = expression(paste("E(pol)",intersect(),"B(line) = 0")))
points(.5, -.5, col = 'red', pch = 16)

# 9: 2
plot(s, col = c(NA, 'darkgreen'), border = 'blue', main = expression(paste("E(pol)",intersect(),"E(line) = 2")))
plot(p0 / po, col = '#ff8888', add = TRUE)
plot(s, col = c(NA, 'darkgreen'), border = 'blue', add = TRUE)

Code from Pebesma and Bivand (2023)

  • The predicate equals corresponds to the DE-9IM string "T*F**FFF*". If any two geometries obey this relationship, they are (topologically) equal!

Slowing Down: 9IM (no DE yet!)

9IM Interior Boundary Exterior
Interior
 

 

 
Boundary
 

 

 
Exterior
 

 

 
Table 1: From OSGeo Project

Dimensionally Extended (DE) 9IM

9IM Interior Boundary Exterior
Interior
2

1

2
Boundary
1

0

1
Exterior
2

1

2
Table 2: From OSGeo Project

Crunching it Down into a Tiny Box

DE-9IM Interior Boundary Exterior
Interior 2 1 2
Boundary 1 0 1
Exterior 2 1 2

And Then into a Tiny String

212101212

And Then into an Infinitesimally-Small Point

DE-9IM Masks

  • Now terms can be given unambiguous, precise meaning!
st_overlaps() Interior Boundary Exterior
Interior T * T
Boundary * * *
Exterior T * *
  • Special Values (besides 0, 1, 2):
    • T: “True” (non-empty, st_dimension() >= 0)
    • F: “False” (empty, st_dimension() == NA)
    • *: “Wildcard” (Don’t care what the value is)
  • st_overlaps(): T*T***T**, st_equals(): T*F**FFF*

DE-9IM vs. Everyday Language

  • DE-9IM can (in theory) represent \(6^9 = 10~077~696\) possible geometric relationships!
  • The English language has like 10, and they’re ambiguous ☠️ (Compromise employed by GIS systems: allow multiple masks for same English word:)
English Mask 212101212 Result
“Disjoint” FF*FF**** FALSE x not disjoint from y
“Touches” FT******* FALSE x doesn’t touch y
“Touches” F***T**** FALSE x doesn’t touch y
“Crosses” T*T***T** TRUE x crosses y
“Within” TF*F***** FALSE x is not within y
“Overlaps” T*T***T** TRUE x overlaps y

st_relate(): The Ultimate Predicate

Code
library(tidyverse)
library(rnaturalearth)
us <- ne_states(country="United States of America")
dc <- us |> filter(iso_3166_2 == "US-DC")
us <- us |>
  mutate(
    de9im = st_relate(us, dc),
    touch = st_touches(us, dc, sparse = F)
  ) |>
  select(iso_3166_2, name, de9im, touch) |>
  arrange(name)
us
iso_3166_2 name de9im touch geometry
US-AL Alabama FF2FF1212 FALSE MULTIPOLYGON (((-87.41958 3…
US-AK Alaska FF2FF1212 FALSE MULTIPOLYGON (((-141.0056 6…
US-AZ Arizona FF2FF1212 FALSE MULTIPOLYGON (((-111.0063 3…
US-AR Arkansas FF2FF1212 FALSE MULTIPOLYGON (((-90.30422 3…
US-CA California FF2FF1212 FALSE MULTIPOLYGON (((-114.7243 3…
US-CO Colorado FF2FF1212 FALSE MULTIPOLYGON (((-109.0463 4…
US-CT Connecticut FF2FF1212 FALSE MULTIPOLYGON (((-73.6417 41…
US-DE Delaware FF2FF1212 FALSE MULTIPOLYGON (((-75.05809 3…
US-DC District of Columbia 2FFF1FFF2 FALSE MULTIPOLYGON (((-77.02293 3…
US-FL Florida FF2FF1212 FALSE MULTIPOLYGON (((-87.44734 3…
US-GA Georgia FF2FF1212 FALSE MULTIPOLYGON (((-80.89029 3…
US-HI Hawaii FF2FF1212 FALSE MULTIPOLYGON (((-154.8996 1…
US-ID Idaho FF2FF1212 FALSE MULTIPOLYGON (((-117.0382 4…
US-IL Illinois FF2FF1212 FALSE MULTIPOLYGON (((-89.1237 36…
US-IN Indiana FF2FF1212 FALSE MULTIPOLYGON (((-84.80608 4…
US-IA Iowa FF2FF1212 FALSE MULTIPOLYGON (((-96.48266 4…
US-KS Kansas FF2FF1212 FALSE MULTIPOLYGON (((-102.0396 3…
US-KY Kentucky FF2FF1212 FALSE MULTIPOLYGON (((-89.42446 3…
US-LA Louisiana FF2FF1212 FALSE MULTIPOLYGON (((-89.52599 3…
US-ME Maine FF2FF1212 FALSE MULTIPOLYGON (((-71.08495 4…
US-MD Maryland FF2F11212 TRUE MULTIPOLYGON (((-75.64786 3…
US-MA Massachusetts FF2FF1212 FALSE MULTIPOLYGON (((-71.19396 4…
US-MI Michigan FF2FF1212 FALSE MULTIPOLYGON (((-84.4913 46…
US-MN Minnesota FF2FF1212 FALSE MULTIPOLYGON (((-97.22609 4…
US-MS Mississippi FF2FF1212 FALSE MULTIPOLYGON (((-88.40221 3…
US-MO Missouri FF2FF1212 FALSE MULTIPOLYGON (((-95.31725 4…
US-MT Montana FF2FF1212 FALSE MULTIPOLYGON (((-116.0482 4…
US-NE Nebraska FF2FF1212 FALSE MULTIPOLYGON (((-104.0537 4…
US-NV Nevada FF2FF1212 FALSE MULTIPOLYGON (((-114.0425 4…
US-NH New Hampshire FF2FF1212 FALSE MULTIPOLYGON (((-71.50585 4…
US-NJ New Jersey FF2FF1212 FALSE MULTIPOLYGON (((-75.54133 3…
US-NM New Mexico FF2FF1212 FALSE MULTIPOLYGON (((-108.1375 3…
US-NY New York FF2FF1212 FALSE MULTIPOLYGON (((-79.06523 4…
US-NC North Carolina FF2FF1212 FALSE MULTIPOLYGON (((-76.03173 3…
US-ND North Dakota FF2FF1212 FALSE MULTIPOLYGON (((-104.0476 4…
US-OH Ohio FF2FF1212 FALSE MULTIPOLYGON (((-80.52023 4…
US-OK Oklahoma FF2FF1212 FALSE MULTIPOLYGON (((-103.0002 3…
US-OR Oregon FF2FF1212 FALSE MULTIPOLYGON (((-124.4924 4…
US-PA Pennsylvania FF2FF1212 FALSE MULTIPOLYGON (((-79.76301 4…
US-RI Rhode Island FF2FF1212 FALSE MULTIPOLYGON (((-71.23686 4…
US-SC South Carolina FF2FF1212 FALSE MULTIPOLYGON (((-78.57316 3…
US-SD South Dakota FF2FF1212 FALSE MULTIPOLYGON (((-104.0567 4…
US-TN Tennessee FF2FF1212 FALSE MULTIPOLYGON (((-90.30422 3…
US-TX Texas FF2FF1212 FALSE MULTIPOLYGON (((-103.3115 2…
US-UT Utah FF2FF1212 FALSE MULTIPOLYGON (((-111.0502 4…
US-VT Vermont FF2FF1212 FALSE MULTIPOLYGON (((-73.35134 4…
US-VA Virginia FF2F11212 TRUE MULTIPOLYGON (((-76.01325 3…
US-WA Washington FF2FF1212 FALSE MULTIPOLYGON (((-122.753 48…
US-WV West Virginia FF2FF1212 FALSE MULTIPOLYGON (((-82.58945 3…
US-WI Wisconsin FF2FF1212 FALSE MULTIPOLYGON (((-87.80425 4…
US-WY Wyoming FF2FF1212 FALSE MULTIPOLYGON (((-109.0463 4…

(If You Don’t Want to Scroll)

Code
us |> filter(touch)
iso_3166_2 name de9im touch geometry
US-MD Maryland FF2F11212 TRUE MULTIPOLYGON (((-75.64786 3…
US-VA Virginia FF2F11212 TRUE MULTIPOLYGON (((-76.01325 3…

Doing Things with DE-9IM (Back to Binary Operations)

Almost a Spatial Join

Code
library(mapview)
library(leaflet.extras2)
africa_sf <- ne_countries(continent = "Africa", scale = 50) |> select(iso_a3, geounit)
africa_map <- mapview(africa_sf, label="geounit", legend=FALSE)
N <- 10
africa_union_sf <- sf::st_union(africa_sf)
sampled_points_sf <- sf::st_sample(africa_union_sf, N) |> sf::st_sf() |> mutate(temp = runif(N, 0, 100))
sampled_points_map <- mapview(sampled_points_sf, label="Random Point", col.regions=cbPalette[1], legend=FALSE)
countries_points_sf <- africa_sf[sampled_points_sf,]
filtered_map <- mapview(countries_points_sf, label="geounit", legend=FALSE) + sampled_points_map
(africa_map + sampled_points_map) | filtered_map

Spatial Filter \(\neq\) Spatial Join

The issue: Data attributes of POINTs are not merged into data attributes of POLYGONs

POINT Attributes
Code
st_geometry(sampled_points_sf) <- c("geom")
sampled_points_sf |> head()
geom temp
POINT (-5.135802 22.30373) 99.14195
POINT (48.01641 10.55753) 41.21422
POINT (47.30147 5.038318) 95.30175
POINT (22.64591 7.799242) 68.37119
POINT (7.739425 5.95728) 56.40943
POINT (22.13408 20.87522) 86.12631
POLYGON Attributes
Code
countries_points_sf |> head(4)
iso_a3 geounit geometry
2 ZMB Zambia MULTIPOLYGON (((30.39609 -1…
58 SOM Somalia MULTIPOLYGON (((41.53271 -1…
59 -99 Somaliland MULTIPOLYGON (((48.93857 11…
91 NGA Nigeria MULTIPOLYGON (((7.300781 4….

Our First Real Spatial Join: st_join()

Code
joined_sf <- countries_points_sf |> st_join(sampled_points_sf)
joined_sf |> head()
iso_a3 geounit temp geometry
2 ZMB Zambia 5.788125 MULTIPOLYGON (((30.39609 -1…
58 SOM Somalia 95.301751 MULTIPOLYGON (((41.53271 -1…
59 -99 Somaliland 41.214222 MULTIPOLYGON (((48.93857 11…
91 NGA Nigeria 56.409432 MULTIPOLYGON (((7.300781 4….
114 MLI Mali 99.141948 MULTIPOLYGON (((-11.3894 12…
123 LBY Libya 86.126311 MULTIPOLYGON (((9.51875 30….

But… We Were Still in Easy Mode

  • Every point could be matched one-to-one with a country. But what if… 😱
Code
g <- st_make_grid(st_bbox(st_as_sfc("LINESTRING(0 0,1 1)")), n = c(2,2))
par(mar = rep(0,4))
plot(g)
plot(g[1] * diag(c(3/4, 1)) + c(0.25, 0.125), add = TRUE, lty = 2)
text(c(.2, .8, .2, .8), c(.2, .2, .8, .8), c(1,2,4,8), col = 'red')

Spatially Intensive vs. Spatially Extensive

  • Extensive attributes: associated with a physical size (length, area, volume, counts of items). Ex: population count.
    • Associated with an area \(\implies\) if that area is cut into smaller areas, the population count needs to be split too
    • (At minimum, the sum of the population counts for the smaller areas needs to equal the total for the larger area)
  • Intensive attributes: Not proportional to support: if the area is split, values may vary but on average remain the same. Ex: population density
    • If an area is split into smaller areas, population density is not split similarly!
    • The sum of population densities for the smaller areas is a meaningless measure
    • Instead, the mean will be more useful as ~similar to the density of the total

Handling the Extensive Case

  • Assume the extensive attribute \(Y\) is uniformly distributed over a space \(S_i\) (e.g., for population counts we assume everyone is evenly-spaced across the region)

  • We first compute \(Y_{ij}\), derived from \(Y_i\) for a sub-area of \(S_i\), \(A_{ij} = S_i \cap T_j\):

    \[ \hat{Y}_{ij}(A_{ij}) = \frac{|A_{ij}|}{|S_i|}Y_i(S_i) \]

    where \(|\cdot|\) denotes area.

  • Then we can compute \(Y_j(T_j)\) by summing all the elements over area \(T_j\):

\[ \hat{Y}_j(T_j) = \sum_{i=1}^{p}\frac{|A_{ij}|}{|S_i|}Y_i(S_i) \]

Handling the Intensive Case

  • Assume the variable \(Y\) has constant value over a space \(S_i\) (e.g., population density in assumed to be the same across all sub-areas)
  • Then the estimate for a sub-area is the same as the estimate for the total area:

\[ \hat{Y}_{ij} = Y_i(S_i) \]

  • So that we can obtain estimates of \(Y\) for new spatial units \(T_j\) via area-weighted average of the source values:

\[ \hat{Y}_j(T_j) = \sum_{i=1}^{p}\frac{|A_{ij}|}{|T_j|}Y_j(S_i) \]

References

Cohn, P. M. 1965. Universal Algebra. Springer Science & Business Media.
Krygier, John, and Denis Wood. 2016. Making Maps, Third Edition: A Visual Guide to Map Design for GIS. Guilford Publications.
Lovelace, Robin, Jakub Nowosad, and Jannes Muenchow. 2024. Geocomputation with R. CRC Press. https://r.geocompx.org/.
Monmonier, Mark. 2018. How to Lie with Maps. University of Chicago Press. https://www.dropbox.com/scl/fi/7rsqbxgge6llggnaf5tit/How-to-Lie-with-Maps-Third-Edition.pdf?rlkey=6rqxta7cjyq3oqdnskj4dtwj8&dl=1.
Pebesma, Edzer, and Roger Bivand. 2023. Spatial Data Science: With Applications in R. CRC Press. https://r-spatial.org/book/.